LSA learner sentence comprehension in agglutinative and non-agglutinative languages
Abstract
This work was carried out in the context of the automatic evaluation of learner summaries, where text comprehension is modelled using Latent Semantic Analysis (LSA) and Natural Language Processing (NLP) techniques. We had informally observed that lemmatized versions of LSA matrices matched human similarity judgements for Basque more closely than non-lemmatized ones. This research tests that observation by comparing the impact of lemmatization on semantic similarity modelling in an agglutinative and a non-agglutinative language (Basque and Spanish, respectively). Parallel Basque-Spanish corpora were used so that the same semantic knowledge is represented in both languages, which makes it possible to observe how closely the results agree across these two morphologically different languages. Lemmatized and non-lemmatized LSA measures were compared against human judgements.
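The comparison described above can be illustrated with a minimal sketch, assuming NumPy, raw term counts, and a truncated SVD as the LSA step. The four toy sentences, the hand-written `LEMMAS` table, and the choice of `k = 2` are all illustrative assumptions; they are not the corpora, the lemmatizer, or the settings used in the study.

```python
import numpy as np

# Hypothetical hand-written lemma table (illustrative only, not from the study).
LEMMAS = {
    "etxean": "etxe", "etxeko": "etxe",
    "katua": "katu", "katuak": "katu",
    "mendian": "mendi", "mendiko": "mendi",
    "ibaia": "ibai", "ibaiak": "ibai",
    "dago": "egon", "daude": "egon",
}

# Toy tokenized sentences (assumed for the example).
DOCS = [
    ["etxean", "katua", "dago"],
    ["etxeko", "katuak", "daude"],
    ["mendian", "ibaia", "dago"],
    ["mendiko", "ibaiak", "daude"],
]


def lemmatize(doc):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(tok, tok) for tok in doc]


def term_doc_matrix(docs):
    """Build a raw term-by-document count matrix."""
    vocab = sorted({tok for doc in docs for tok in doc})
    row = {tok: i for i, tok in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            mat[row[tok], j] += 1.0
    return mat


def lsa_doc_vectors(docs, k):
    """Return one k-dimensional LSA vector per document via truncated SVD."""
    mat = term_doc_matrix(docs)
    _, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (s[:k, None] * vt[:k]).T


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


raw_vecs = lsa_doc_vectors(DOCS, k=2)
lem_vecs = lsa_doc_vectors([lemmatize(d) for d in DOCS], k=2)

# Similarity between sentences 0 and 1, which share lemmas but no surface forms.
sim_raw = cosine(raw_vecs[0], raw_vecs[1])
sim_lem = cosine(lem_vecs[0], lem_vecs[1])
print(f"non-lemmatized: {sim_raw:.3f}  lemmatized: {sim_lem:.3f}")
```

In this toy setting the two sentences share no word forms, so the non-lemmatized space treats them as unrelated, while the lemmatized space treats them as identical. In a study like the one described, such model similarities would then be compared with human ratings for the same sentence pairs, for instance with a correlation coefficient.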
Similar resources
Morphological Development in the Interlanguage of English Learners of Xhosa
This study investigates the development of morphology in the interlanguage of English learners of Xhosa. A quasi-longitudinal research design is used to trace development in the oral interlanguage of six learners of Xhosa for a period of eight months. The elicitation tasks employed range from fairly unstructured conversation tasks to highly structured sentence-manipulation tasks. The learners h...
The Production of Nominal and Verbal Inflection in an Agglutinative Language: Evidence from Hungarian
The contrast between regular and irregular inflectional morphology has been useful in investigating the functional and neural architecture of language. However, most studies have examined the regular/irregular distinction in non-agglutinative Indo-European languages (primarily English) with relatively simple morphology. Additionally, the majority of research has focused on verbal rather than no...
Utilizing Agglutinative Features in Japanese-Uighur Machine Translation
Japanese and Uighur languages are agglutinative languages and they have many syntactical and morphological similarities. And roughly speaking, we can translate Japanese into Uighur sequentially by replacing Japanese words with corresponding Uighur ones after morphological analysis. However, we should translate agglutinated suffixes carefully to make correct translation, because they play import...
Joint PoS Tagging and Stemming for Agglutinative Languages
The number of word forms in agglutinative languages is theoretically infinite and this variety in word forms introduces sparsity in many natural language processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity. In this paper, we present an unsupervised Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and stemming for ag...
Turkish LVCSR: Database Preparation and Language Modeling for an Agglutinative Language
Turkish language is an agglutinative language. It is possible to produce a very high number of words from the same root with suffixes [1]. Language modeling for agglutinative languages needs to be different than modeling of languages like English. Such languages also have inflections but not as many as an agglutinative language. Techniques which can be used for modeling agglutinative languages ...